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OPTIMAL ESTIMATION OF THE MEAN FUNCTION BASED ON 
DISCRETELY SAMPLED FUNCTIONAL DATA: 
PHASE TRANSITION 

By T. Tony Cai 1 and Ming Yuan 2 

University of Pennsylvania and Georgia Institute of Technology 

The problem of estimating the mean of random functions based 
on discretely sampled data arises naturally in functional data analy- 
sis. In this paper, we study optimal estimation of the mean function 
under both common and independent designs. Minimax rates of con- 
vergence are established and easily implementable rate-optimal esti- 
mators are introduced. The analysis reveals interesting and different 
phase transition phenomena in the two cases. Under the common de- 
sign, the sampling frequency solely determines the optimal rate of 
convergence when it is relatively small and the sampling frequency 
has no effect on the optimal rate when it is large. On the other hand, 
under the independent design, the optimal rate of convergence is de- 
termined jointly by the sampling frequency and the number of curves 
when the sampling frequency is relatively small. When it is large, the 
sampling frequency has no effect on the optimal rate. Another inter- 
esting contrast between the two settings is that smoothing is neces- 
sary under the independent design, while, somewhat surprisingly, it 
is not essential under the common design. 

1. Introduction. Estimating the mean function based on discretely sam- 
pled noisy observations is one of the most basic problems in functional data 
analysis. Much progress has been made on developing estimation methodolo- 
gies. The two monographs by Ramsay and Silverman (2002, 2005) provide 
comprehensive discussions on the methods and applications. See also Ferraty 
and Vieu (2006). 

Let X(-) be a random function defined on the unit interval T = [0, 1] and 
X\, . . . , X n be a sample of n independent copies of X. The goal is to estimate 
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the mean function go(-) ■= E(X(-)) based on noisy observations from discrete 
locations on these curves: 

(1.1) Yij = Xi(Tij) + £ij , J = 1,2, ...,rrH and i = l,2, ...,n, 

where Tj are sampling points, and are independent random noise vari- 
ables with Esij = and finite second moment Eef - = <Tq < +oo. The sample 
path of X is assumed to be smooth in that it belongs to the usual Sobolev- 
Hilbert spaces of order r almost surely, such that 

(1.2) [xW(t)] 2 dt \ <+oo. 

Such problems naturally arise in a variety of applications and are typical 
in functional data analysis [see, e.g., Ramsay and Silverman (2005), Ferraty 
and Vieu (2006)]. Various methods have been proposed. However, little is 
known about their theoretical properties. 

In the present paper, we study optimal estimation of the mean func- 
tion in two different settings. One is when the observations are sampled at 
the same locations across curves, that is, T\j = T^j = ■ ■ ■ = T n j =: Tj for all 
j = l,...,m. We shall refer to this setting as common design because the 
sampling locations are common to all curves. Another setting is when the 
are independently sampled from T, which we shall refer to as indepen- 
dent design. We establish the optimal rates of convergence for estimating the 
mean function in both settings. Our analysis reveals interesting and different 
phase transition phenomena in the two cases. Another interesting contrast 
between the two settings is that smoothing is necessary under the inde- 
pendent design, while, somewhat surprisingly, it is not essential under the 
common design. We remark that under the independent design, the number 
of sampling points oftentimes varies from curve to curve and may even be 
random itself. However, for ease of presentation and better illustration of 
similarities and differences between the two types of designs, we shall as- 
sume an equal number of sampling points on each curve in the discussions 
given in this section. 

Earlier studies of nonparametric estimation of the mean function go from 
a collection of discretely sampled curves can be traced back to at least Hart 
and Wehrly (1986) and Rice and Silverman (1991) in the case of common 
design. In this setting, ignoring the temporal nature of {Tj : 1 < j < m}, the 
problem of estimating go can be translated into estimating the mean vector 
(go(Ti), ... ,go(T m ))', a typical problem in multivariate analysis. Such no- 
tions are often quickly discarded because they essentially lead to estimating 
go(Tj) by its sample mean 



(1.3) 
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based on the standard Gauss-Markov theory [see, e.g., Rice and Silverman 
(1991)]. 

Note that E(ii,-|T) = go(Tij) and that the smoothness of X implies that 
go is also smooth. It is therefore plausible to assume that smoothing is es- 
sential for optimal estimation of go. For example, a natural approach for 
estimating go is to regress Yij on Tjj nonparametrically via kernel or spline 
smoothing. Various methods have been introduced along this vein [see, e.g., 
Rice and Silverman (1991)]. However, not much is known about their the- 
oretical properties. It is noteworthy that this setting differs from the usual 
nonparametric smoothing in that the observations from the same curve are 
highly correlated. Nonparametric smoothing with certain correlated errors 
has been previously studied by Hall and Hart (1990), Wang (1996) and 
Johnstone and Silverman (1997), among others. Interested readers are re- 
ferred to Opsomer, Wang and Yang (2001) for a recent survey of existing 
results. But neither of these earlier developments can be applied to account 
for the dependency induced by the functional nature in our setting. To com- 
prehend the effectiveness of smoothing in the current context, we establish 
minimax bounds on the convergence rate of the integrated squared error for 
estimating fo- 
under the common design, it is shown that the minimax rate is of the or- 
der m~ 2r + n" 1 where the two terms can be attributed to discretization and 
stochastic error, respectively. This rate is fundamentally different from the 
usual nonparametric rate of (nm)~ 2r ^ 2r+1 ' ) when observations are obtained 
at nm distinct locations in order to recover an r times differentiable func- 
tion [see, e.g., Stone (1982)]. The rate obtained here is jointly determined 
by the sampling frequency m and the number of curves n rather than the 
total number of observations ran. A distinct feature of the rate is the phase 
transition which occurs when ra is of the order n l / 2r . When the functions 
are sparsely sampled, that is, ra = 0(n l l 2r ) 1 the optimal rate is of the or- 
der m~ 2r , solely determined by the sampling frequency. On the other hand, 
when the sampling frequency is high, that is, ra 3> n}l 2r , the optimal rate 
remains 1/n regardless of ra. Moreover, our development uncovers a sur- 
prising fact that interpolation of {(Tj,Y.j) :j = l,... ,m}, that is, estimating 
go{Tj) by Y.j, is rate optimal. In other words, contrary to the conventional 
wisdom, smoothing does not result in improved convergence rates. 

In addition to the common design, another popular sampling scheme 
is the independent design where the Ty are independently sampled from 
T ■ A natural approach is to smooth observations from each curve sepa- 
rately and then average over all smoothed estimates. However, the success 
of this two-step procedure hinges upon the availability of a reasonable es- 
timate for each individual curve. In contrast to the case of common de- 
sign, we show that under the independent design, the minimax rate for 
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estimating go is (rim) 2r /( 2r+1 ) + n 1 j which can be attained by smoothing 
{(Tij, Yij) : 1 < i < n, 1 < j < m} altogether. This implies that in the extreme 
case of m = 1, the optimal rate of estimating go is n~ 2r /^ 2r+l \ which also 
suggests the sub-optimality of the aforementioned two-step procedure be- 
cause it is impossible to smooth a curve with only a single observation. 
Similar to the common design, there is a phase transition phenomenon in 
the optimal rate of convergence with a boundary at m = n l l 2r . When the 
sampling frequency m is small, that is, m = Ofo 1 / 27 '), the optimal rate is of 
the order (nm)" 2r /' 2r+1 ' which depends jointly on the values of both m and 
n. In the case of high sampling frequency with m 3> n l / 2r , the optimal rate 
is always 1/n and does not depend on m. 

It is interesting to compare the minimax rates of convergence in the two 
settings. The phase transition boundary for both designs occurs at the same 
value, m = n l / 2r . When m is above the boundary, that is, m > n 1//2r , there is 
no difference between the common and independent designs, and both have 
the optimal rate of n~ 1 . When m is below the boundary, that is, m <C n l / 2r , 
the independent design is always superior to the common design in that it 
offers a faster rate of convergence. 

Our results connect with several observations made earlier in the litera- 
ture on longitudinal and functional data analysis. Many longitudinal studies 
follow the independent design, and the number of sampling points on each 
curve is typically small. In such settings, it is widely recognized that one 
needs to pool the data to obtain good estimates, and the two-step pro- 
cedure of averaging the smooth curves may be suboptimal. Our analysis 
here provides a rigorous justification for such empirical observations by pin- 
pointing to what extent the two-step procedure is suboptimal. The phase 
transition observed here also relates to the earlier work by Hall, Miiller and 
Wang (2006) on estimating eigenfunctions of the covariance kernel when the 
number of sampling points is either fixed or of larger than n 1//4+<5 for some 
5 > 0. It was shown that the eigenfunctions can be estimated at the rate of 
ra -4 / 5 in the former case and 1/n in the latter. We show here that estimating 
the mean function has similar behavior. Furthermore, we characterize the 
exact nature of such transition behavior as the sampling frequency changes. 

The rest of the paper is organized as follows. In Section 2 the optimal 
rate of convergence under the common design is established. We first de- 
rive a minimax lower bound and then show that the lower bound is in fact 
rate sharp. This is accomplished by constructing a rate-optimal smooth- 
ing splines estimator. The minimax upper bound is obtained separately for 
the common fixed design and common random design. Section 3 considers 
the independent design and establishes the optimal rate of convergence in 
this case. The rate-optimal estimators are easily implementable. Numerical 
studies are carried out in Section 4 to demonstrate the theoretical results. 
Section 5 discusses connections and differences of our results with other 
related work. All proofs are relegated to Section 6. 
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2. Optimal rate of convergence under common design. In this section we 
consider the common design where each curve is observed at the same set of 
locations {Tj : 1 < j < m}. We first derive a minimax lower bound and then 
show that this lower bound is sharp by constructing a smoothing splines 
estimator that attains the same rate of convergence as the lower bound. 

2.1. Minimax lower bound. Let V(r; Mq) be the collection of probability 
measures for a random function X such that its sample path is r times 
differentiable almost surely and 



for some constant Mq > 0. Our first main result establishes the minimax 
lower bound for estimating the mean function over V(r\ Mq) under the com- 
mon design. 

Theorem 2.1. Suppose the sampling locations are common in model 
(1-1). Then there exists a constant d>0 depending only on Mq and the 
variance <Tq of measurement error eij such that for any estimate g based on 
observations {(Tj,Yij) :1 < i< n, 1 < j < m}, 

(2.2) limsup sup P(\\g- go\\l 2 >d(m- 2r + n~ 1 ))>0. 

n^oo C(X)eP(r;M ) 

The lower bound established in Theorem 2.1 holds true for both common 
fixed design where Tj 's are deterministic, and common random design where 
TVs are also random. The term m~ 2r in the lower bound is due to the 
deterministic approximation error, and the term n _1 is attributed to the 
stochastic error. It is clear that neither can be further improved. To see this, 
first consider the situation where there is no stochastic variation and the 
mean function qq is observed exactly at the points Tj, j = 1, . . . ,m. It is well 
known [see, e.g., DeVore and Lorentz (1993)] that due to discretization, it 
is not possible to recover go at a rate faster than m~ 2r for all go such that 
J[9q^] 2 < -Ma. On the other hand, the second term n~ 1 is inevitable since 
the mean function gQ cannot be estimated at a faster rate even if the whole 
random functions X\ , . . . , X n are observed completely. We shall show later 
in this section that the rate given in the lower bound is optimal in that it is 
attainable by a smoothing splines estimator. 

It is interesting to notice the phase transition phenomenon in the minimax 
bound. When the sampling frequency m is large, it has no effect on the rate 
of convergence, and go can be estimated at the rate of 1/n, the best pos- 
sible rate when the whole functions were observed. More surprisingly, such 
saturation occurs when m is rather small, that is, of the order n 1 / 2r '. On the 
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other hand, when the functions are sparsely sampled, that is, m = 0(n 1 / 2r ), 
the rate is determined only by the sampling frequency m. Moreover, the rate 
m~ 2r is in fact also the optimal interpolation rate. In other words, when the 
functions are sparsely sampled, the mean function go can be estimated as 
well as if it is observed directly without noise. 

The rate is to be contrasted with the usual nonparametric regression 
with nm observations at arbitrary locations. In such a setting, it is well 
known [see, e.g., Tsybakov (2009)] that the optimal rate for estimating go is 
(mn)~ 2r / <y2r+1 \ and typically stochastic error and approximation error are 
of the same order to balance the bias-variance trade-off. 



2.2. Minimax upper bound: Smoothing splines estimate. We now con- 
sider the upper bound for the minimax risk and construct specific rate op- 
timal estimators under the common design. These upper bounds show that 
the rate of convergence given in the lower bound established in Theorem 2.1 
is sharp. More specifically, it is shown that a smoothing splines estimator 
attains the optimal rate of convergence over the parameter space V{r;Mo). 

We shall consider a smoothing splines type of estimate suggested by Rice 
and Silverman (1991). Observe that / i— > J[f^] 2 is a squared semi-norm 
and therefore convex. By Jensen's inequality, 

(2.3) J [gp \t)} 2 alt <E J [X {r \t)} 2 alt < oo , 

which implies that go belongs to the rth order Sobolev-Hilbert space, 

W r 2 ([0,l]) = {g:[0,l]^Jl\g,gW,...,g^ 

are absolutely continuous and g( r ' € AjQO, 1])}. 

Taking this into account, the following smoothing splines estimate can be 
employed to estimate go : 

(2.4) 5A = argmin| — ^^(Y^ - g(T 3 )) 2 + X [ [g^ (t)] 2 alt] , 

where A > is a tuning parameter that balances the fidelity to the data and 
the smoothness of the estimate. 

Similarly to the smoothing splines for the usual nonparametric regres- 
sion, g\ can be conveniently computed, although the minimization is taken 
over an infinitely-dimensional functional space. First observe that g\ can be 
equivalently rewritten as 

(2.5) 5A = argmin|lf;(y i - 5 (r i )) 2 + A / [g^{t)] 2 alt\. 
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Appealing to the so-called representer theorem [see, e.g., Wahba (1990)], the 
solution of the minimization problem can be expressed as 

r— 1 m 

(2.6) g x (t) = J2 d kt k + ^2c i K(t,T J ) 

k=0 j=l 

for some coefficients do, ... , d r _i, ci, . . . , c m , where 

(2.7) K(s,t) = -L Br (s)B r (t) - j^y B 2r (\s - t\), 

where B m {-) is the mth Bernoulli polynomial. Plugging (2.6) back into 
(2.5), the coefficients and subsequently g\ can be solved in a straightfor- 
ward way. This observation makes the smoothing splines procedure easily 
implementable. The readers are referred to Wahba (1990) for further details. 

Despite the similarity between g\ and the smoothing splines estimate 
in the usual nonparametric regression, they have very different asymptotic 
properties. It is shown in the following that g\ achieves the lower bound 
established in Theorem 2.1. 

The analyses for the common fixed design and the common random design 
are similar, and we shall focus on the fixed design where the common sam- 
pling locations T%, . . . , T m are deterministic. In this case, we assume without 
loss of generality that T\ < T 2 < • • • < T m . The following theorem shows that 
the lower bound established in Theorem 2.1 is attained by the smoothing 
splines estimate g\. 

Theorem 2.2. Consider the common fixed design and assume that 

(2.8) max |2i+i — TA < Com~ l 

0<j<m 

for some constant Cq > where we follow the convention that Tq = and 
T m+1 = 1. Then 

(2.9) lim limsup sup P{\\g x - go\\c 2 > D{m~ 2r + n' 1 )) = 

D^oo n ^oo C(X)£V(r;M ) 

for any A = 0(m~ 2r + n _1 ). 

Together with Theorem 2.1, Theorem 2.2 shows that g\ is minimax rate 
optimal if the tuning parameter A is set to be of the order 0{m~ 2r + n~ l ). We 
note the necessity of the condition given by (2.8). It is clearly satisfied when 
the design is equidistant, that is, Tj = 2j/(2m + 1). The condition ensures 
that the random functions are observed on a sufficiently regular grid. 

It is of conceptual importance to compare the rate of g\ with those gener- 
ally achieved in the usual nonparametric regression setting. Defined by (2.4), 



8 



T. T. CAI AND M. YUAN 



g\ essentially regresses Yij on Tj. Similarly to the usual nonparametric re- 
gression, the validity of the estimate is driven by K(Yij\Tj) = go(Tj). The 
difference, however, is that Yn,...,Yi m are highly correlated because they 
are observed from the same random function -Xj(-). When all the Y^s are 
independently sampled at Tj's, it can be derived that the optimal rate for 
estimating g$ is m~ 2r + (mn)~ 2r ^ 2r+1 \ As we show here, the dependency 
induced by the functional nature of our problem leads to the different rate 

— 2r i —1 

m + n . 

A distinct feature of the behavior of g\ is in the choice of the tuning 
parameter A. Tuning parameter selection plays a paramount role in the 
usual nonparametric regression, as it balances the the tradeoff between bias 
and variance. Optimal choice of A is of the order (mii)" 2r /' 2r+1 ' in the 
usual nonparametric regression. In contrast, in our setting, more flexibility 
is allowed in the choice of the tuning parameter in that g\ is rate optimal 
so long as A is sufficiently small. In particular, taking A — > + , g\ reduces to 
the splines interpolation, that is, the solution to 

(2.10) min l[g [r) {t)\ 2 subject to g(Tj) = Y.j, j = l,...,m. 

This amounts to, in particular, estimating go(Tj) by Y.j. In other words, 
there is no benefit from smoothing in terms of the convergence rate. However, 
as we will see in Section 4, smoothing can lead to improved finite sample 
performance. 

Remark. More general statements can also be made without the con- 
dition on the spacing of sampling points. More specifically, denote by 

R(Ti T m ) = max \T j+1 - Tj \ 

j 

the discretization resolution. Using the same argument, one can show that 
the optimal convergence rate in the minimax sense is R 2r + n _1 and g\ is 
rate optimal so long as A = 0(R 2r + n _1 ). 

Remark. Although we have focused here on the case when the sampling 
points are deterministic, a similar statement can also be made for the setting 
where the sampling points are random. In particular, assuming that Tj are 
independent and identically distributed with a density function r/ such that 
inf tg 7-ry(t) > co > and #o £ WJo, it can be shown that the smoothing splines 
estimator g\ satisfies 

(2.11) lim limsup sup P(\\g x - go\\c 2 > D(m~ 2r + n" 1 )) = 

D^oo n ^oo C(X)eV(r;M ) 

for any A = 0(m~ 2r + n" 1 ). In other words, g\ remains rate optimal. 
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3. Optimal rate of convergence under independent design. In many ap- 
plications, the random functions Xi are not observed at common locations. 
Instead, each curve is discretely observed at a different set of points [see, 
e.g., James and Hastie (2001), Rice and Wu (2001), Diggle et al. (2002), 
Yao, Miiller and Wang (2005)]. In these settings, it is more appropriate to 
model the sampling points as independently sampled from a common dis- 
tribution. In this section we shall consider optimal estimation of the mean 
function under the independent design. 

Interestingly, the behavior of the estimation problem is drastically differ- 
ent between the common design and the independent design. To keep our 
treatment general, we allow the number of sampling points to vary. Let m 
be the harmonic mean of mi, . . . , m n , that is, 



Denote by Ai(m) the collection of sampling frequencies (mi, . . . ,m n ) whose 
harmonic mean is m. In parallel to Theorem 2.1, we have the following 
minimax lower bound for estimating g$ under the independent design. 

Theorem 3.1. Suppose Tij are independent and identically distributed 
with a density function i] such that inft g 7- n(t) > cq > 0. Then there exists a 
constant d > depending only on Mq and <7q such that for any estimate g 
based on observations {(Tij,Yij) :l<i<n,l<j< m}, 



(3.1) limsup sup P{\\g-g \\ 2 C2 >d((nm)- 2r / {2r+ V +n~ 1 ))>0. 



n^oo C(X)<=T(r;M ) 

(mi,...,m„)£M(m) 

The minimax lower bound given in Theorem 3.1 can also be achieved 
using the smoothing splines type of estimate. To account for the different 
sampling frequency for different curves, we consider the following estimate 
of go: 



Theorem 3.2. Under the conditions of Theorem 3.1, i/A - (nm) 2r /( 2r + 1 ) ; 
then the smoothing splines estimator g\ satisfies 

lim limsup sup P(\\g - g \\l 2 > D({nm)- 2r/[2r+l) + n" 1 )) 

D->oo n ^oo C(X)eV(r;M ) 
(mi,...,m„)6M(m) 




(3.2) g x = argmin< - 
sewr In 




(3.3) 



= 0. 



In other words, g\ is rate optimal. 
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Theorems 3.1 and 3.2 demonstrate both similarities and significant differ- 
ences between the two types of designs in terms of the convergence rate. For 
either the common design or the independent design, the sampling frequency 
only plays a role in determining the convergence rate when the functions are 
sparsely sampled, that is, m = 0(n 1 ^ 2r ). But how the sampling frequency 
affects the convergence rate when each curve is sparsely sampled differs be- 
tween the two designs. For the independent design, the total number of 
observations mn, whereas for the common design m alone, determines the 
minimax rate. It is also noteworthy that when m = 0(n 1//2r ), the optimal 
rate under the independent design, (mn)~ 2r ' ( 2r + 1 ) ) is the same as if all the 
observations are independently observed. In other words, the dependency 
among Yn, . . . , Y^ m does not affect the convergence rate in this case. 



Remark. We emphasize that Theorems 3.1 and 3.2 apply to both de- 
terministic and random sampling frequencies. In particular for random sam- 
pling frequencies, together with the law of large numbers, the same minimax 
bound holds when we replace the harmonic mean by 

/ n 

when assuming that m^'s are independent. 




3.1. Comparison with two-stage estimate. A popular strategy to handle 
discretely sampled functional data in practice is a two-stage procedure. In 
the first step, nonparametric regression is run for data from each curve to 
obtain estimate Xi of Xi, % = 1, 2, . . . , n. For example, they can be obtained 
by smoothing splines 

(3.4) A > i , A = argmin(lf;(y iJ -< 7 (T y )) 2 + A / [f^(t)] 2 dt\. 

/ewj JT J 

Any subsequent inference can be carried out using the X^\ as if they were 
the original true random functions. In particular, the mean function g$ can 
be estimated by the simple average 

1 n 

(3.5) ~gx = -Y^x i)X . 

i=i 

Although formulated differently, it is worth pointing out that this procedure 
is equivalent to the smoothing splines estimate g\ under the common design. 
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Proposition 3.3. Under the common design, that is, T{j = Tj for 1 < 
i <n and 1 < j < m. The estimate g\ from the two-stage procedure is equiv- 
alent to the smoothing splines estimate g\ : g\ = g\. 

In light of Theorems 2.1 and 2.2, the two-step procedure is also rate 
optimal under the common design. But it is of great practical importance to 
note that in order to achieve the optimality, it is critical that in the first step 
we under smooth each curve by using a sufficiently small tuning parameter. 

Under independent design, however, the equivalence no long holds. The 
success of the two-step estimate g\ depends upon getting a good estimate of 
each curve, which is not possible when m is very small. In the extreme case 
of m = 1, the procedure is no longer applicable, but Theorem 3.2 indicates 
that smoothing splines estimate g\ can still achieve the optimal convergence 
rate of n~ 2r ^ 2r+1 \ The readers are also referred to Hall, Miiller and Wang 
(2006) for discussions on the pros and cons of similar two-step procedures 
in the context of estimating the functional principal components. 

4. Numerical experiments. The smoothing splines estimators are easy 
to implement. To demonstrate the practical implications of our theoretical 
results, we carried out a set of simulation studies. The true mean function 
go is fixed as 

50 

(4.1) 50 = E 4 (- 1 ) fc+1 ^' 

fc=i 

where <fi\(t) = 1 and (j)j,+i(t) = v2 cos(/c7ri) for k > 1. The random function 
X was generated as 

50 

(4.2) X = g + Y J (kZ k (f> k , 

k=l 

where Z k are independently sampled from the uniform distribution on 
[—s/3,y/3], and Cfc are deterministic. It is not hard to see that £| are the 
eigenvalues of the covariance function of X and therefore determine the 
smoothness of a sample curve. In particular, we take £fc = (— l) fc+1 /c -11 / 2 . 
It is clear that the sample path of X belongs to the second order Sobolev 
space (r = 2). 

We begin with a set of simulations designed to demonstrate the effect of 
interpolation and smoothing under common design. A data set of fifty curves 
were first simulated according to the aforementioned scheme. For each curve, 
ten noisy observations were taken at equidistant locations on each curve fol- 
lowing model (1.1) with ctq = 0.5 2 . The observations, together with g$ (grey 
line), are given in the right panel of Figure 1. Smoothing splines estimate g\ 
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is also computed with a variety of values for A. The integrated squared error, 
\\g\ — go\\c 2 > as a function of the tuning parameter A is given in the left panel. 
For A smaller than 0.1, the smoothing splines estimate essentially reduces to 
the spline interpolation. To contrast the effect of interpolation and smooth- 
ing, the right panel also includes the interpolation estimate (solid black line) 
and g\ (red dashed line) with the tuning parameter chosen to minimize the 
integrated squared error. We observe from the figure that smoothing does 
lead to slightly improved finite sample performance although it does not 
affect the convergence rate as shown in Section 2. 

The next numerical experiment intends to demonstrate the effect of sam- 
ple size n, sampling frequency m as well as design. To this end, we simulated 
n curves, and from each curve, m discrete observations were taken follow- 
ing model (1.1) with <Tq = 0.5 2 . The sampling locations are either fixed at 
Tj = (2j)/(2m + 1), j = 1, . . . ,m, for common design or randomly sampled 
from the uniform distribution on [0,1]. The smoothing splines estimate g\ 
for each simulated data set, and the tuning parameter is set to yield the 
smallest integrated squared error and therefore reflect the best performance 




Fig. 1. Effect of smoothing under common design: for a typical data set with fifty curves, 
ten observations were taken on each curve. The observations and go (solid grey line) are 
given in the right panel together with the spline interpolation estimate (solid black line) 
and smoothing splines estimate (red dashed line) with the tuning parameter chosen to yield 
the smallest integrated squared error. The left panel gives the integrated squared error of 
the smoothing splines estimate as a function of the tuning parameter. It is noted that the 
smoothing splines estimate essentially reduces to the spline interpolation for A smaller 
than 0.1. 
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of the estimating procedure for each data set. We repeat the experiment 
with varying combinations of n = 25, 50 or 200, m = 1, 5, 10 or 50. For the 
common design, we restrict to m = 10 or 50 to give more meaningful com- 
parison. The true function go as well as its estimates obtained in each of the 
settings are given in Figure 2. 

Figure 2 agrees pretty well with our theoretical results. For instance, in- 
creasing either m or n leads to improved estimates, whereas such improve- 
ment is more visible for small values of m. Moreover, for the same value of 
m and n, independent designs tend to yield better estimates. 

To further contrast the two types of designs and the effect of sampling 
frequency on estimating go, we now fix the number of curves at n = 100. For 
the common design, we consider m = 10, 20, 50 or 100. For the independent 
design, we let m = 1, 5, 10, 20, 50 or 100. For each combination of (n, m), two 
hundred data sets were simulated following the same mechanism as before. 
Figure 3 gives the estimation error averaged over the one hundred data sets 
for each combination of (n,m). It clearly shows that independent design 
is preferable over common design when m is small; and the two types of 
designs are similar when m is large. Both phenomena are in agreement with 
our theoretical results developed in the earlier sections. 



Common Design (n=25) Independent Design (n=25) 




0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

T T 



n=50 n=50 




0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

T T 



n=200 




n=200 




Fig. 2. Effect ofm, n and type of design on estimating go: smoothing splines estimates 
obtained under various combinations are plotted together with go- 
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Fig. 3. Effect of design type and sampling frequency on estimating go: the black solid 
line and circles correspond to common design whereas the red dashed lines and circles 
correspond to independent design. The error bars correspond to the average ± one standard 
errors based on two hundred repetitions. Note that both axes are in log scale to yield better 
comparison. 

5. Discussions. We have established the optimal rates of convergence for 
estimating the mean function under both the common design and indepen- 
dent design. The results reveal several significant differences in the behavior 
of the minimax estimation problem between the two designs. These revela- 
tions have important theoretical and practical implications. In particular, 
for sparsely sampled functions, the independent design leads to a faster rate 
of convergence when compared to the common design and thus should be 
preferred in practice. 

The optimal rates of convergence for estimating the mean function based 
on discretely sampled random functions behave in a fundamentally different 
way from the minimax rate of convergence in the conventional nonparametric 
regression problems. The optimal rates in the mean function estimation are 
jointly determined by the sampling frequency m and the number of curves 
n rather than the total number of observations mn. 

The observation that one can estimate the mean function as well as if the 
the whole curves are available when m 3> ri 1//2r bears some similarity to some 
recent findings on estimating the covariance kernel and its eigenfunction 
under independent design. Assuming that X is twice differentiable (i.e., 
r = 2), Hall, Miiller and Wang (2006) showed that when m 3> n 1 / 4 " 1 "" 5 for 
some 5 > 0, the covariance kernel and its eigenfunctions can be estimated at 
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the rate of 1/n when using a two-step procedure. More recently, the cutoff 
point is further improved to n 1 / 2r logn with general r by Cai and Yuan 
(2010) using an alternative method. Intuitively one may expect estimating 
the covariance kernel to be more difficult than estimating the mean function, 
which suggests that these results may not be improved much further for 
estimating the covariance kernel or its eigenfunctions. 

We have also shown that the particular smoothing splines type of estimate 
discussed earlier by Rice and Silverman (1991) attains the optimal conver- 
gence rates under both designs with appropriate tuning. The smoothing 
splines estimator is well suited for nonparametric estimation over Sobolev 
spaces. We note, however, other nonparametric techniques such as kernel or 
local polynomial estimators can also be used. We expect that kernel smooth- 
ing or other methods with proper choice of the tuning parameters can also 
achieve the optimal rate of convergence. Further study in this direction is 
beyond the scope of the current paper, and we leave it for future research. 

Finally, we emphasize that although we have focused on the univariate 
Sobolev space for simplicity, the phenomena observed and techniques devel- 
oped apply to more general functional spaces. Consider, for example, the 
multivariate setting where T = [0, l] d . Following the same arguments, it can 
be shown that the minimax rate for estimating an r-times differentiable func- 
tion is m~ 2r l d + n~ l under the common design and (nm)" 2r ^ r+<i ' + n _1 
under the independent design. The phase transition phenomena thus re- 
main in the multidimensional setting under both designs with a transition 
boundary of m = n d / 2r . 



6. Proofs. 



Proof of Theorem 2.1. Let V be the collection all measurable func- 
tions of {(Tij , Yij) :1 <i <n, 1 < j < m}. First note that it is straightforward 
to show that 



limsupinf sup P(\\g — go\\ 2 £ 2 > dn 1 )>0 

n-Kx> geV C(X)eV(r;M ) 

by considering X as an unknown constant function where the problem es- 
sentially becomes estimating the mean from n i.i.d. observations, and 1/n 
is known as the optimal rate. It now suffices to show that 

limsup inf sup P(\\g — 9o\\c 2 > dm~ 2r ) > 0. 
n^oo geT> C{X)eV(r;M ) 



Let <£>i,..., ip2m be 2m functions from W2 with distinct support, that is, 

- / 
~h 



<p k (-) = h r K[—^), k = l,...,2m, 
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where h = l/(2m), = 1) /2m + l/4m, and -fT : R — >• [0, oo) is an r 
times differentiable function with support [—1/2, 1/2]. See Tsybakov (2009) 
for explicit construction of such functions. 
For each b = (&i, . . . , b 2m ) G {0, l} 2m , define 

2m 

fc=i 

It is clear that 

min H ~ 9b ' 11 ^ = y k \\l = (2m)-^ +1 )||K||L. 
H(b,b')>i H(b,b>) "^ fe|l£2 v ; 11 nCi 

The claim then follows from an application of Assouad's lemma [Assouad 
(1983)]. □ 

Proof of Theorem 2.2. It is well known [see, e.g., Green and Silver- 
man (1994)] that g\ can be characterized as the solution to the following: 

min / [# (r) (t)] 2 dt subject to g(Tj) =g\(Tj), j = l,..., m. 

Write 

Sj =g\(Tij) - go(Tij), 
and let h be the linear interpolation of {(Tj,5j) :l<j< m}, that is, 
( £i, 0<t<Ti, 



(6.1) h(t) = < 



S j rp + S.j+i— J — , Tj<t< T j+1 , 

,5 m , T m <t< 1. 



Then g\ = Qt(9o + h) where Qt be the operator associated with the rth 
order spline interpolation, that is, Qt(/) is the solution to 

m }J} r I \9 {r) it)f dt subject to g(Tj) = f(Tj), j = l,...,m. 

Recall that Qt is a linear operator in that Qt(/i + /2) = Qt(/i) +Qt(/2) 
[see, e.g., DeVore and Lorentz (1993)]. Therefore, #a = Qt(9o) + Qr(h). By 
the triangular inequality, 

(6-2) H3-S0IU2 < HQt(5o) -50IU2 + IIQtWIUs- 

The first term on the right-hand side represents the approximation error of 
spline interpolation for go, and it is well known that it can be bounded by 
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see, e.g., DeVore and Lorentz (1993)] 
\\Qt(9o) -9o\\c 2 

<co(max \T j+1 -Tj\ 2r ) [ [g { r) (t)] 2 dt 

\0<j<m J Jj- 



(6.3) 



< coM^m 



-2r 



Hereafter, we shall use Co > as a generic constant which may take different 
values at different appearance. 

It now remains to bound ||Qr(^)IU 2 - We appeal to the relationship be- 
tween spline interpolation and the best local polynomial approximation. Let 

Ij = [Tj_ r +i, Tj+ r ] 

with the convention that Tj = for j < 1 and Tj = 1 for j > m. Denote by 
Pj the best approximation error that can be achieved on Ij by a polynomial 
of order less that r, that is, 

2 



(6.4) 



P j (f)= min I 

a k : k<rjj 



r-1 



5> fe t fc -/(t) 



fe=0 



(It. 



It can be shown [see, e.g., Theorem 4.5 on page 147 of DeVore and Lorentz 
(1993)] that 

\\f-Q T (f)\\l 2 <c (^P 3 (f) 2 ^. 



Then 

(m 
3=1 

Together with the fact that 



1/2 



P 



(hf < [ h(tf 
Jli 



dt, 



we have 



\\Q T (h)\\l 2 < c \\h\\l 2 



<c J2 tf(Tj+i ~ Tj-!) < com- 1 S 



3=1 



3=1 



c m 1 ^ [g x (Tj)- g (Tj ) } ' 
3=1 
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< com" 1 - g\(Tj)] 2 + [Y.j - g (Tj)] 2 ). 

3=1 



Observe that 



E l^rri- 1 Y^Yj ~ 9o(Tj)} 2 j = CQ(J 2 n 



2„-l 



It suffices to show that 



m 
3=1 

To this end, note that by the definition of g\, 

- £(*i ~ &(2i)) 2 ^ - " 5a(^)) 2 + A / [5i r) (*)f dt 

ill' »it J7" 

1 m /" 

<-E^-5o(^)) 2 +a / [i\t)] 2 dt 

171 3=1 JT 

<O p (m- 2r + H- 1 ), 
because A = 0{m~ 2r + n~ l ). The proof is now complete. □ 

Proof of Theorem 3.1. Note that any lower bound for a specific case 
yields immediately a lower bound for the general case. It therefore suffices to 
consider the case when X is a Gaussian process and m\ = ni2 = ■ ■ ■ = m n =: 
m. Denote by N = c{nm) l ^ 2r+l ^ where c > is a constant to be specified 
later. Let b = {b\, . . . , 6/v) £ {0, 1}^ be a binary sequence, and write 

2N 

g b (-) = M 1 / \- r ]T N-Wk-'h-NMU, 

k=N+l 

where ifk(t) = \/2cos(irkt). It is not hard to see that 

« 2N 

/ [gt\t)] 2 dt = M^ (^r(N-^k- r k 

'' T k>N+l 

2N 

= M N~ 1 h-N<M . 



k-N) 



k=N+l 
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Furthermore, 

27V 

H-gvWl^M^N- 1 Y, k- 2r (b k - N -b' k _ N ) 2 

k=N+l 

27V 

>M vr- 2 '-(2iV)-( 2 ^ 1 ) £ (b k _ m - b' k _ m ) 2 

k=N+l 

= c Q N-^ r+ ^H{b,b') 

for some constant cq > 0. By the Varshamov-Gilbert bound [see, e.g., Tsy- 
bakov (2009)], there exists a collection of binary sequences {b^\ . . . , b^ M '} C 
{0, 1}^ such that M > 2 N / 8 , and 

H , b {k) ) > N/8 yi<j<k< M. 

Then 

hbU) -9bw\\c 2 - c o N ~ V - 

Assume that X is a Gaussian process with mean <#,, T follows a uni- 
form distribution on T and the measurement error e ~ iV(0, Oq). Condi- 
tional on {Tij :j = l,... , m}, Z^. = (Zn, . . . , Zi m )' follows a multivariate nor- 
mal distribution with mean \ib = (gb(Tn), ■ ■ ■ ,gb(Tim))' an d covariance ma- 
trix E(T) = (Co(Tjj, Ti/ C )) 1 <j j / C < m -\-OqI. Therefore, the Kullback-Leibler dis- 
tance from probability measure ^ to H g ^ can be bounded by 

KL ( n 9 6 0) \ U 9 b (k) ) = n M{»9 b(j) ~ V9 b(k) )'^HT)^g bU) ~ Hg bW )] 

< Tk7o 2 Er||/i ffiW) -^ 6(fe) || 2 

= nma^ 2 \\g bU ) -g b (k)\\c 2 

< cinmoQ 2 N~ 2r . 
An application of Fano's lemma now yields 

nr ll~ II ^ Kr-r(<i ^(cmmaf N~ 2r ) + \og2 \ 

max E , o - o h N) r„ > co-N 1 — — 

l<j<M W^ll£ 2 - ^ \ogM J 

X ( m )- r /( H1 ' 

with an appropriate choice of c, for any estimate g. This in turn implies that 

limsup inf sup P(\\g - g \\l 2 > d(nm)" 2r/(2r+1) ) > 0. 
n->oo gev C(X)eV{r;M ) 

The proof can then be completed by considering X as an unknown constant 
function. □ 
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Proof of Theorem 3.2. For brevity, in what follows, we treat the 
sampling frequencies mi,...,m n as deterministic. All the arguments, how- 
ever, also apply to the situation when they are random by treating all the 
expectations and probabilities as conditional on m\, . . . ,m n . Similarly, we 
shall also assume that Tj's follow uniform distribution. The argument can 
be easily applied to handle more general distributions. 

It is well known that WJ, endowed with the norm 



M\2 



(6-5) \\f\\ 2 W r = //* + /(/ 

forms a reproducing kernel Hilbert space [Aronszajn (1950)]. Let %$ be the 
collection of all polynomials of order less than r and "Hi be its orthogo- 
nal complement in ■ Let :1 < k < r} be a set of orthonormal basis 
functions of Ho, and : k > r} an orthonormal basis of Hi such that any 
/ G admits the representation 

/=E-/^- 

V>1 



Furthermore, 



l 2 =E/* and n/iivvj = E( 1 +^ 1 )/' 



U>1 U>1 



where p\ = ■ ■ ■ = p r = +00 and p v ^v r ' . 
Recall that 



argmhJ £ mn (g) + A / 
geH(K) I J 



g (r) } 2 



gGH(K) 

where 

mi 



«s) = -E-E(^-^)) 2 - 

i=i % j=i 

For brevity, we shall abbreviate the subscript of g hereafter when no confu- 
sion occurs. Write 

Ms)=E -E-E^'-^)] 2 
\ i=i 1 j=i / 

= E([Y n - go(Tn)} 2 ) + j [g(s) - g (s)} 2 ds. 



Let 



g = arg mm 
geH(K) 



too(g) + A [g^f 
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Denote 

imn,x(g) = i mn {g) + \J [g {r) ] 2 ; 4o,a(s) = Ms) + A J [g^f. 

Let 

g = g- \G^ l Dl mn ,x{g), 

where G\ = (l/2)D 2 £ 00j \(g) and D stands for the Frechet derivative. It is 
clear that 

g - go = (g - go) + (g - g) + (g - §)■ 

We proceed by bounding the three terms on the right-hand side separately. 
In particular, it can be shown that 

(6.6) \\g-go\\ 2 c 2 <coxJ[g { r) } 2 

and 

(6.7) ||s - g\\h = P {n~ l + (nm)"^" 1 '^)). 
Furthermore, if 

nmAW -> oo, 

then 

(6.8) \\g - g\\ 2 C2 = o^n" 1 + (nm^A" 1 ^). 
Therefore, 

IIS " 9o\\l 2 = O p (X + n~ l + (nmJ^A- 1 ^)). 

Taking 

Ax(nm)- 2f /( 2r+1 ) 

yields 

\\g-go\\l 2 =O p {n^ + {nm)-^l^). 

We now set to establish bounds (6.6)-(6.8). For brevity, we shall assume 
in what follows that all expectations are taken conditionally on mi, . . . ,m n 
unless otherwise indicated. Define 

(6.9) \\g\\l = ^ l + Pu l ) a 9l 

where < a < 1 . 
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We begin with g — go- Write 
(6.10) 0)Q = 00 = 



fc>i 



fc>i 



Then 
(6.11) 



4o(<?) = E([Fn - 5o(Tn)] 2 ) + - a fc ) 2 . 



fc>i 



It is not hard to see 



(6.12) 6 fc := (g,4>k)c 2 = argmin{(6 fe - a k ) + Xp k b k } = 1 . 
Hence, 



i«=E( 1+ ^ 1 ) a ( 5 *- a *) 2 
fe>i 



fe>l 



< c A 2 sup 



(l+a) oo 



a 2 . 



k >i (1 + Ap fc j k=1 
i-a /r»i 2 



E-l 2 



Next, we consider Notice that Di mn ^{g) = D£ mnjX {g) - Dl^^g) 

D£mn(g) ~ D£oo(g). Therefore 

E[D£ mn>x (g)f] 2 = E[D£ mn (g)f - Dl^g)/} 2 



n 

V ^Var 



i=i 1 



Y.o\, <i"\,)jiT,,)) 

3=1 



Note that 



Var 



Var 



Var 



+ E 



Var Hr^/^OIT 



^=1 



E([5o(^')-9(^)]/(^)) 
j'=l 



+ E 



Var Jyy/^KT 
Vj=i y 
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" rrii 

Var Y,([9o(Ti d )-9mj)\fmj)) 

= m i Vax(\g (T il )-g(T il )]f{T il )) 
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<mmm{T a )-g{Ta)]f{Tii)f 
= m l J^[g (t)-g(t)]f(t)) 2 dt 

<mi [ \go(t) - g(t)} 2 dt [ f(t)dt, 



T 



T 



where the last inequality follows from the Cauchy-Schwarz inequality. To- 
gether with (6.6), we get 



(6.13) 



Var 



< com, 



A. 



^(bo(Tij) — g(Tij)]f(Tij)) 

We now set out to compute the the second term. First observe that 

(mi \ 

mi 

= ^ f( T ij)f( T ik)(Co(Tij,T ik ) + aQ5jk), 
j,k=i 

where 6jk is Kronecker's delta. Therefore, 



E 



Vj=i j 



mi(mi - 1) 



f(s)C (s,t)f(t)dsdt 



TxT 



+ rrna 2 \\f\\i 2 + m, J f\s)C{s, s) ds. 



Summing up, we have 
(6.14) 
where 

(6.15) c k 



E[Dl mn , x (g)<Pk] 2 < ^ — )+ — > 



4>k(s)C (s,t)(p k (t) ds dt. 



TxT 
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-G^De nm> x(g) 



2(1 + p^Hl + \ P ^)- 2 {Dl nmtX {g)^ k 

k>l 

\i=l V fc>l 

+ ^E( 1 +^ 1 H 1+A ^ 1 rv 



k>l 



Observe that 



l/(2r-) 



fc>l 



and 



/c>i 



fc>i 



Thus, 



< 



co 



n 2 I 4^ rrii J 



V(2r) + A 



It remains to bound g — g. It can be easily verified that 
(6.16) g-~g= lG^ 1 [D 2 £ 00 (g)(g - g) - D 2 £ mn (g)(g - g)\. 

Then 

Il5-5|| 2 =2( 1 +^ 1 ) Q ( 1 + W)" 2 



fe>l 



1 n 1 



2 — 5^(5 ( T u) - g( T ij))M T ij) 



i=i j=i 



r 



(<K S ) - g(s))4>k(s)ds 



Clearly, (5 — <?)0fc G Write 

(6.17) {g-g)<t>k = ^h j (j) j . 

J>1 
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j n j mi „ 

- V — y"(K^) - giTij^MTij) - / ($(*) - <fc 

3=1 

n . mi . N 

-Y—Y^^-J Ms) ds 



i=l j=l 



< 



fci>i 



< 



jti>i 



i=l j=l 



1 " 1 



Lfci>l \ i=l j=l 



T 



Ai (s) ds 



for any < 7 < 1, where in the last inequality, we used the fact that 

(6.18) HQ - g)Mi < \\9 ~ ahUkh = \\g - ffll 7 (i + p~k l V' 2 - 

Following a similar calculation as before, it can be shown that 



(1 . w . 1 I"* f 
i=l 1 j=l J 1 

\i=l V fci>l 

which is finite whenever 7 > l/2r. Recall that m is the harmonic mean of 
mi, . . . , m n . Therefore, 



(s) cis 



(6.19) 



— 1 1 2 



1 



nmA Q+ T +1 /(*') 
If a > l/2r, then taking 7 = yields 

1 



(6.20) 

assuming that 
(6.21) 



\i = o P 



nmA 2a+l/(2r) 

nm\ 2a+l '^ -> 00. 
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Together with the triangular inequality 

(6.22) \\g-g\\ a > \\g-g\\ a - \\g-g\\ a = (1 - o p {i))\\g - g\\ a . 

Therefore, 

(6.23) ||s - g\\l = p (\\g - g\\l) = O^n" 1 + H 4 ^* 1 ). 
Together with (6.19), 

'* " ^ ~ ^(^»<" _1 + (-)-^-°- 1/<2 ")) 

= o p (n" 1 A Q + H^A-W). 

We conclude by noting that in the case when mi, . . . , m n are random, m 
can also be replaced with the expectation of the harmonic mean thanks to 
the law of large numbers. □ 

Proof of Proposition 3.3. Let Qt,x be the smoothing spline opera- 
tor, that is, Qt,a(/i; ■ ■ i fm) is the solution to 

^{^g (/i - 9(Ti))2 + "/r l!,W( " 12 *}' 

It is clear that 

gx = Q T ,x(Y.i,Y. 2 ,...,Y. m ) 

and 

X^x = QT,x{Ya,Yi2, . . . ,Y im ). 
Because Qt,x is a linear operator [see, e.g., Wahba (1990)], we have 

1 ~ 1 " 
h = ~^ x i,x = - y^QT,x(Yn,Y i2 , . . . ,Y im ) 

i=i i=i 
= Q T ,x(Y 1 ,Y. 2 ,...,Y. m ) = g x . □ 
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